Abstract

We investigate the transportation cost-information inequalities for
bifurcating Markov chains, which are a class of processes indexed by
a binary tree. These processes provide models for cell growth when
each individual in one generation gives birth to two offspring in the
next one. Transportation cost inequalities provide useful concentration
inequalities. We also study deviation inequalities for the empirical
means under relaxed assumptions on the Wasserstein contraction
of the Markov kernels. Applications to bifurcating nonlinear autoregressive
processes are considered: deviation inequalities for pointwise
estimates of the nonlinear leading functions.

Keywords: Transportation cost-information inequalities, Wasserstein distance,
bifurcating Markov chains, deviation inequalities, geometric ergodicity.


1 Introduction 

Roughly speaking, a bifurcating Markov chain is a Markov chain indexed
by a regular binary tree. This class of processes is well adapted to the
study of populations where each individual in one generation gives birth
to two offspring in the next one. They were introduced by Guyon [2£
in order to study the Escherichia coli aging process. Namely, when a cell
divides into two offspring, are the genetic traits identical for the two
daughter cells? Recently, several models of bifurcating Markov chains, or
models using the theory of bifurcating Markov chains, for example in
the form of bifurcating autoregressive processes, have been studied [l|, Q, [H,
showing that these processes are of great importance to the analysis
of cell division. There is now an important literature covering asymptotic
theorems for bifurcating Markov chains such as the Law of Large Numbers,
Central Limit Theorems, the Moderate Deviation Principle and the Law of
Iterated Logarithm, see for example 0, [13, S, E3, [13, [l3 for recent references.
These limit theorems are particularly useful when applied to the statistics
of bifurcating processes, making it possible to provide efficient tests to assert whether
the aging of the cell is different for the two offspring (see [30] for a real
case study). Of course, these limit theorems may be considered only in the
"ergodic" case, i.e. when the law of the random lineage chain has a unique
invariant measure.

However, limit theorems are only asymptotic results, and one often
has to work with data from a population of limited size. It is thus very
natural to control the statistics non-asymptotically. Such deviation
inequalities (or concentration inequalities) have recently been the subject of many
studies and we refer to the books of Ledoux [31] and Massart [35] for nice
introductions to the subject, covering both the i.i.d. case and the dependent
case with a wide variety of tools (Laplace controls, functional inequalities,
Efron-Stein, ...). It was one of the goals of Bitseki et al. [1] to investigate
deviation inequalities for additive functionals of bifurcating Markov chains.
In their work, one of the main hypotheses is that the Markov chain
associated to a random lineage of the population is uniformly geometrically
ergodic. It is clearly a very strong assumption, nearly reducing interesting
models to the compact case. The purpose of this paper is to considerably
weaken this hypothesis. More specifically, our aim is to obtain deviation
inequalities for bifurcating Markov chains when the auxiliary Markov chains
satisfy some contraction properties in Wasserstein distance, and some
(uniform) integrability property. This will be done with the help of
transportation cost-information inequalities and direct Laplace controls. In order
to present our results, we now properly define the model of bifurcating
Markov chains.
1.1 Bifurcating Markov chains

First we introduce some useful notation. Let $\mathbb{T}$ be a regular binary tree in
which each vertex is seen as a positive integer different from 0. For $r \in \mathbb{N}$,
let

\[ \mathbb{G}_r = \{2^r, 2^r+1, \dots, 2^{r+1}-1\}, \qquad \mathbb{T}_r = \bigcup_{q=0}^{r} \mathbb{G}_q, \]

which denote respectively the $r$-th column and the first $(r+1)$ columns of
the tree. The whole tree is thus defined by

\[ \mathbb{T} = \bigcup_{r=0}^{\infty} \mathbb{G}_r. \]

The column of a given vertex $n$ is $\mathbb{G}_{r_n}$ with $r_n = \lfloor \log_2 n \rfloor$, where $\lfloor x \rfloor$ denotes
the integer part of the real number $x$.
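The indexing convention above can be sketched in a few lines of Python. This is only an illustration: the function names (`generation`, `column`, `subtree`, `children`, `mother`, `vertex_type`) are ours and do not appear in the paper.

```python
def generation(n: int) -> int:
    """Return r_n = floor(log2(n)), the generation of vertex n >= 1."""
    if n < 1:
        raise ValueError("vertices are positive integers")
    # bit_length avoids floating-point rounding issues of math.log2
    return n.bit_length() - 1


def column(r: int) -> range:
    """Vertices of the r-th generation G_r = {2^r, ..., 2^(r+1) - 1}."""
    return range(2 ** r, 2 ** (r + 1))


def subtree(r: int) -> range:
    """Vertices of the first (r + 1) generations T_r = G_0 u ... u G_r."""
    return range(1, 2 ** (r + 1))


def children(n: int) -> tuple:
    """The two offspring 2n (type 0) and 2n + 1 (type 1) of vertex n."""
    return (2 * n, 2 * n + 1)


def mother(n: int) -> int:
    """The ancestor of vertex n >= 2 in the previous generation."""
    if n < 2:
        raise ValueError("the root has no mother")
    return n // 2


def vertex_type(n: int) -> int:
    """Type 0 for even vertices, type 1 for odd vertices."""
    return n % 2
```

For instance, `list(column(2))` enumerates the third column `4, 5, 6, 7`, and `len(subtree(n))` gives the size $|\mathbb{T}_n| = 2^{n+1}-1$ used later as the dimension $N$.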

In the sequel, we will see $\mathbb{T}$ as a given population in which each individual
in one generation gives birth to two offspring in the next one. This will make
the introduction of different notions easier. The vertex $n$ will denote the
individual $n$ and the ancestor of individuals $2n$ and $2n+1$. The individuals
who belong to $2\mathbb{N}$ (resp. $2\mathbb{N}+1$) will be called individuals of type 0 (resp. type
1). The column $\mathbb{G}_r$ and the first $(r+1)$ columns $\mathbb{T}_r$ will denote respectively
the $r$-th generation and the first $(r+1)$ generations. The initial individual
will be denoted 1.

For each individual $n$, we look into a random variable $X_n$, defined on
a probability space $(\Omega, \mathcal{F}, \mathbb{P})$ and which takes its values in a metric space
$(E,d)$ endowed with its Borel $\sigma$-algebra $\mathcal{E}$. We assume that each pair of
random variables $(X_{2n}, X_{2n+1})$ depends on the past values $(X_m, m \in \mathbb{T}_{r_n})$
only through $X_n$. In order to describe this dependence, let us introduce the
following notion.

Definition 1.1 (T-transition probability, see (01)). We call T-transition
probability any mapping $P : E \times \mathcal{E}^2 \to [0,1]$ such that

• $P(\cdot, A)$ is measurable for all $A \in \mathcal{E}^2$,

• $P(x, \cdot)$ is a probability measure on $(E^2, \mathcal{E}^2)$ for all $x \in E$.

In particular, for all $x, y, z \in E$, $P(x, dy, dz)$ denotes the probability
that the couple of the quantities associated with the children are in the
neighbourhood of $y$ and $z$ given that the quantity associated with their
mother is $x$.
For a T-transition probability $P$ on $E \times \mathcal{E}^2$ we denote by $P_0$, $P_1$ the first
and the second marginal of $P$, that is $P_0(x,A) = P(x, A \times E)$, $P_1(x,A) =
P(x, E \times A)$ for all $x \in E$ and $A \in \mathcal{E}$. Then, $P_0$ (resp. $P_1$) can be seen as
the transition probability associated to individuals of type 0 (resp. type 1).

For $p \ge 1$, we denote by $\mathcal{B}(E^p)$ (resp. $\mathcal{B}_b(E^p)$) the set of all
$\mathcal{E}^p$-measurable (resp. $\mathcal{E}^p$-measurable and bounded) mappings $f : E^p \to \mathbb{R}$.
For $f \in \mathcal{B}(E^3)$, we denote by $Pf \in \mathcal{B}(E)$ the function

\[ x \mapsto Pf(x) = \int_{E^2} f(x,y,z)\, P(x, dy, dz), \quad \text{when it is defined.} \]

We are now in a position to give a precise definition of a bifurcating Markov
chain.

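Definition 1.2 (Bifurcating Markov Chains, see ([^)). Let $(X_n, n \in \mathbb{T})$
be a family of $E$-valued random variables defined on a filtered probability
space $(\Omega, \mathcal{F}, (\mathcal{F}_r, r \in \mathbb{N}), \mathbb{P})$. Let $\nu$ be a probability on $(E, \mathcal{E})$ and $P$ be a
T-transition probability. We say that $(X_n, n \in \mathbb{T})$ is a $(\mathcal{F}_r)$-bifurcating Markov
chain with initial distribution $\nu$ and T-transition probability $P$ if

• $X_n$ is $\mathcal{F}_{r_n}$-measurable for all $n \in \mathbb{T}$,

• $\mathcal{L}(X_1) = \nu$,

• for all $r \in \mathbb{N}$ and for all families $(f_n, n \in \mathbb{G}_r) \subseteq \mathcal{B}_b(E^3)$,

\[ \mathbb{E}\Big[ \prod_{n \in \mathbb{G}_r} f_n(X_n, X_{2n}, X_{2n+1}) \,\Big|\, \mathcal{F}_r \Big] = \prod_{n \in \mathbb{G}_r} P f_n(X_n). \]

In the following, when not specified, the filtration implicitly used will be
$\mathcal{F}_r = \sigma(X_i, i \in \mathbb{T}_r)$.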
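To make the definition concrete, here is a minimal simulation sketch of a toy bifurcating chain on $E = \mathbb{R}$. The kernel used is an assumption chosen for illustration (a first-order bifurcating autoregressive scheme with independent Gaussian noises, so that the two offspring are conditionally independent given their mother); it is not the specific model studied in the paper, and the names `simulate_bar`, `a0`, `a1`, `sigma` are hypothetical.

```python
import random


def simulate_bar(n_gen, a0, a1, sigma, x1, seed=0):
    """Simulate X_1, ..., X_{2^(n_gen+1) - 1} on the binary tree.

    Toy kernel (an assumption for illustration):
        X_{2n}   = a0 * X_n + sigma * eps_{2n},
        X_{2n+1} = a1 * X_n + sigma * eps_{2n+1},
    with independent standard Gaussian noises. Returns {vertex: value}.
    """
    rng = random.Random(seed)
    x = {1: x1}
    # mothers are the vertices of T_{n_gen - 1} = {1, ..., 2^n_gen - 1}
    for n in range(1, 2 ** n_gen):
        # the pair (X_{2n}, X_{2n+1}) depends on the past only through X_n
        x[2 * n] = a0 * x[n] + sigma * rng.gauss(0.0, 1.0)
        x[2 * n + 1] = a1 * x[n] + sigma * rng.gauss(0.0, 1.0)
    return x


def empirical_mean_generation(x, r):
    """Empirical mean of the values over the r-th generation G_r."""
    vertices = range(2 ** r, 2 ** (r + 1))
    return sum(x[i] for i in vertices) / len(vertices)
```

A call such as `simulate_bar(10, 0.5, 0.3, 1.0, 0.0)` produces one trajectory on the first eleven generations; empirical means over a generation are the kind of statistics for which deviation inequalities are sought.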

Remark 1.3. We may of course also consider in this work bifurcating Markov
chains on an a-ary tree (with a > 2) with no additional technicalities, but
at the cost of heavier notation. In the same spirit, Markov chains of higher
order (such as BAR processes considered in j^) could be handled by the
same techniques. A nontrivial extension would be the case of bifurcating
Markov chains on a Galton-Watson tree (see for example 0 under very
strong assumptions), which we will consider elsewhere.
1.2 Transportation cost-information inequality

We recall that $(E, d)$ is a metric space endowed with its Borel $\sigma$-algebra $\mathcal{E}$.
Given $p \ge 1$, the $L^p$-Wasserstein distance between two probability measures
$\mu$ and $\nu$ on $E$ is defined by

\[ W_p^d(\nu, \mu) = \inf \left( \iint d(x,y)^p \, d\pi(x,y) \right)^{1/p}, \]

where the infimum is taken over all probability measures $\pi$ on the product
space $E \times E$ with marginal distributions $\mu$ and $\nu$ (say, couplings of $(\mu, \nu)$).
This infimum is finite as soon as $\mu$ and $\nu$ have finite moments of order $p$.
When $d(x,y) = \mathbf{1}_{x \ne y}$ (the trivial metric), $2 W_1^d(\mu, \nu) = \|\mu - \nu\|_{TV}$, the
total variation of $\mu - \nu$.
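As a small numerical illustration of the definition, the sketch below computes the $L^p$-Wasserstein distance between two empirical measures on the real line with the same number of atoms. It relies on the standard fact that, on $\mathbb{R}$ with $d(x,y) = |x-y|$ and $p \ge 1$, an optimal coupling pairs the sorted samples; the function name `wasserstein_p_1d` is ours, and the restriction to equal sample sizes is a simplifying assumption.

```python
def wasserstein_p_1d(xs, ys, p=1.0):
    """W_p between the empirical measures of xs and ys on the real line.

    Assumes len(xs) == len(ys) and p >= 1; uses the monotone (sorted)
    coupling, which is optimal in dimension one for convex costs.
    """
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("samples must be non-empty and of equal size")
    if p < 1:
        raise ValueError("p must be at least 1")
    # average transport cost |x - y|^p under the sorted coupling
    cost = sum(abs(x - y) ** p for x, y in zip(sorted(xs), sorted(ys)))
    return (cost / len(xs)) ** (1.0 / p)
```

For example, translating a sample by a constant $c$ gives a distance $|c|$ for every $p \ge 1$, which is the mechanism behind Wasserstein contraction for autoregressive kernels with additive noise.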

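The Kullback information (or relative entropy) of $\nu$ with respect to $\mu$ is
defined as

\[ H(\nu | \mu) = \begin{cases} \displaystyle\int \log \frac{d\nu}{d\mu} \, d\nu & \text{if } \nu \ll \mu, \\ +\infty & \text{else.} \end{cases} \]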

Definition 1.4 ($L^p$-transportation cost-inequality). We say that the
probability measure $\mu$ satisfies the $L^p$-transportation cost-information inequality
on $(E,d)$ (and we write $\mu \in T_p(C)$) if there is some constant $C > 0$ such
that for any probability measure $\nu$,

\[ W_p^d(\mu, \nu) \le \sqrt{2 C H(\nu | \mu)}. \]

This transportation inequality was introduced by Marton
as a tool for the (Gaussian) concentration of measure property. The following
result will be crucial in the sequel. It gives a characterization of the
$L^1$-transportation cost-inequality in terms of a concentration inequality. It is of
course one of the main tools to get deviation inequalities (via Markov's
inequality).

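Theorem 1.5 ([11]). $\mu$ satisfies the $L^1$-transportation cost-information
inequality (say $T_1$) on $(E,d)$ with constant $C > 0$, that is, $\mu \in T_1(C)$, if and
only if for any Lipschitzian function $F : (E, d) \to \mathbb{R}$, $F$ is $\mu$-integrable and

\[ \int_E \exp\big( \lambda (F - \langle F \rangle_\mu) \big) \, d\mu \le \exp\left( \frac{\lambda^2}{2} C \|F\|_{Lip}^2 \right) \quad \forall \lambda \in \mathbb{R}, \]

where $\langle F \rangle_\mu = \int_E F \, d\mu$ and

\[ \|F\|_{Lip} = \sup_{x \ne y} \frac{|F(x) - F(y)|}{d(x,y)} < +\infty. \]

In particular, we have the concentration inequality

\[ \mu\big( F - \langle F \rangle_\mu \le -t \big) \vee \mu\big( F - \langle F \rangle_\mu \ge t \big) \le \exp\left( - \frac{t^2}{2 C \|F\|_{Lip}^2} \right) \quad \forall t \in \mathbb{R}^+. \]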

In this work we will focus mainly on the transportation inequality $T_1$. There
is now a considerable literature around these transportation inequalities. As
a flavor, let us cite first the characterization of $T_1$ as a Gaussian integrability
property (see also

Theorem 1.6 ([20]). $\mu$ satisfies the $L^1$-transportation cost-information
inequality (say $T_1$) on $(E, d)$ if and only if there exist $\delta > 0$ and $x_0 \in E$ such
that

\[ \int_E \exp\big( \delta\, d^2(x, x_0) \big) \, d\mu(x) < \infty, \]
and the constant of the transportation inequality can be made explicit.
There is also a large deviations characterization [25]. Recent striking
results on transportation inequalities have been obtained for $T_2$, namely
that they are equivalent to dimension-free Gaussian concentration [24], or
to a restricted class of logarithmic Sobolev inequalities [27]. See also [1^ or
[14] for practical criteria based on Lyapunov type conditions, and we refer for
example to [ 2 ^ or [37] for surveys on transportation inequalities. One of the
main aspects of transportation inequalities is their tensorization property, i.e.
$\mu^{\otimes n}$ will satisfy some transportation inequality if $\mu$ does (with dependence on
the dimension). One important development was to consider such a
property for dependent sequences such as Markov chains. In [20], Djellout et
al., generalizing a result of Marton 3^, have provided conditions under which
the law of a homogeneous Markov chain $(Y_k)_{1 \le k \le n}$ on $E^n$ satisfies the
$L^p$-transportation cost-information inequality $T_p$ with respect to the metric

\[ d_{l_p}(x, y) := \left( \sum_{i=1}^{n} d(x_i, y_i)^p \right)^{1/p}. \]

We will follow similar ideas here to establish the $L^p$-transportation
cost-information inequality for the law of a bifurcating Markov chain $(X_i)_{1 \le i \le N}$
on $E^N$. This will allow us to obtain concentration inequalities for bifurcating
Markov chains under hypotheses much weaker than those of Bitseki
et al. [1]. It would also be tempting to generalize the approach of [ 2 ^ to
Markov chains and bifurcating Markov chains to get directly deviation
inequalities for Markov chains with respect to the invariant measure. However, this would
require a restriction to reversible Markov chains; it is thus not directly suited to
bifurcating Markov chains and would require new ideas.
Remark 1.7. There are natural generalizations of the $T_1$ inequality, often
denoted $\alpha$-$T_1$ inequality, where $\alpha$ is a nonnegative convex lower
semicontinuous function vanishing at 0. We say that the probability measure $\mu$
satisfies $\alpha$-$T_p(C)$ if for any probability measure $\nu$

\[ \alpha\big( W_p^d(\nu, \mu) \big) \le 2 C H(\nu | \mu). \]

The usual $T_1$ inequality is then the case where $\alpha(t) = t^2$. Gozlan
has generalized Bobkov-Götze's Laplace transform control [11] and the
Djellout-Guillin-Wu integrability criterion to this setting, making it possible to recover sub-
or super-Gaussian concentration. The results of the following section can
be generalized to this setting, at the cost of additional technical details and heavier
notation. Details will thus be left to the reader.


2 Transportation cost-information inequalities for 
bifurcating Markov chains 

Let $(X_i, i \in \mathbb{T})$ be a bifurcating Markov chain on $E$ with T-probability
transition $P$ and initial measure $\nu$. For $p \ge 1$ and $C > 0$, we consider the
following assumption that we shall call $(H_p(C))$ in the sequel.

Assumption 2.1 ($(H_p(C))$).

(a) $\nu \in T_p(C)$;

(b) $P(x, \cdot, \cdot) \in T_p(C)$, $\forall x \in E$;

(c) $W_p^{d_{l_p}}\big( P(x, \cdot, \cdot), P(\tilde{x}, \cdot, \cdot) \big) \le q\, d(x, \tilde{x})$, $\forall x, \tilde{x} \in E$ and some $q > 0$.

It is important to remark that under $(H_p(C))$, (c), there
exist $r_0$ and $r_1$ smaller than $q$ such that for $b = 0, 1$

\[ W_p^d\big( P_b(x, \cdot), P_b(\tilde{x}, \cdot) \big) \le r_b\, d(x, \tilde{x}), \quad \forall x, \tilde{x} \in E. \]

Note also that when $P(x, dy, dz) = P_0(x, dy) P_1(x, dz)$, these last two
Wasserstein contraction properties imply $(H_p(C))$, (c) with $q \le
(r_0^p + r_1^p)^{1/p}$ (using the trivial coupling). We may remark also that by $(H_p(C))$, (b),
$P_0$ and $P_1$ also satisfy (uniformly) a transportation inequality. Let us note
that thanks to the Hölder inequality, $(H_p(C))$ implies $(H_1(C))$.

We do not suppose here that $q$, $r_0$ and $r_1$ are strictly less than 1, and thus
the two marginal chains, as well as the bifurcating one, are not a priori
contractions. We are thus considering here both "stable" and "unstable"
cases.

We then have the following result for the law of the whole trajectory on
the binary tree.


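Theorem 2.2. Let $n \in \mathbb{N}$, let $\mathcal{P}$ be the law of $(X_i)_{1 \le i \le |\mathbb{T}_n|}$ and denote
$N = |\mathbb{T}_n|$. We assume Assumption 2.1 for $1 \le p \le 2$. Then $\mathcal{P} \in T_p(C_N)$
where

\[ C_N = \begin{cases} \dfrac{C N^{2/p - 1}}{(1-q)^2} & \text{if } q < 1, \\[2mm] C \exp\left( 2 - \dfrac{2}{p} \right) (N+1)^{2/p + 1} & \text{if } q = 1, \\[2mm] C (N+1) \left( \dfrac{\exp(p-1)\, q^{pN}}{q^p - 1} \right)^{2/p} & \text{if } q > 1. \end{cases} \]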

Before the proof of this result, let us introduce the following notation. For a
Polish space $\chi$, we denote by $\mathcal{M}_1(\chi)$ the space of probability measures on $\chi$.
For $x \in E^N$, $x^i := (x_1, \dots, x_i)$. For $\mu \in \mathcal{M}_1(E^N)$, let $(x_1, \dots, x_N) \in E^N$ be
distributed randomly according to $\mu$. We denote by $\mu^i$ the law of $x^i$, and
by $\mu^i_{x^{2i-1}}$ the conditional law of $(x_{2i}, x_{2i+1})$ given $x^{2i-1}$, with the convention
$\mu^1_{x^0} = \mu^1$, where $x^0 = x_0$ is some fixed point. In particular, if $\mu$ is the
law of a bifurcating Markov chain with T-probability transition $P$, then
$\mu^i_{x^{2i-1}} = P(x_i, \cdot, \cdot)$.

For the convenience of the reader, we recall the formula of additivity of
entropy (see e.g. [37, Lemma 22.8]).

Lemma 2.3. Let $N \in \mathbb{N}$, let $\chi_1, \dots, \chi_N$ be Polish spaces and $P, Q \in \mathcal{M}_1(\chi)$
where $\chi = \prod_{i=1}^{N} \chi_i$. Then

\[ H(Q | P) = \sum_{i=1}^{N} \int_{\chi} H\big( Q^i_{x^{i-1}} \,\big|\, P^i_{x^{i-1}} \big) \, Q(dx), \]

where $P^i_{x^{i-1}}$ and $Q^i_{x^{i-1}}$ are defined in the same way as above.

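We can now prove the Theorem.

Proof of Theorem 2.2. Let $Q \in \mathcal{M}_1(E^N)$. Assume that $H(Q | \mathcal{P}) < \infty$
(trivial otherwise). Let $\varepsilon > 0$. The idea is of course to condition
with respect to the previous generation, i.e. to $\mathbb{G}_{n-1}$, but we will do it
sequentially by pairs. Conditionally on their ancestors, every pair of offspring
of an individual is independent of the offspring of the other individuals of
the same generation. Let $i$ be a member of generation $\mathbb{G}_{j-1}$, and define for
a realization $x$ on the tree $\mathbb{T}_j(x) := (x_1, \dots, x_{|\mathbb{T}_j|})$. By the definition of the
Wasserstein distance, there is a coupling $\pi_{y^{2i-1}, x^{2i-1}}$ of $(Q^i_{y^{2i-1}}, \mathcal{P}^i_{x^{2i-1}})$ such
that

\[ \begin{aligned} A_i &:= \int \big( d(y_{2i}, x_{2i})^p + d(y_{2i+1}, x_{2i+1})^p \big) \, d\pi_{y^{2i-1}, x^{2i-1}} \\ &\le (1+\varepsilon)\, W_p^{d_{l_p}}\big( Q^i_{y^{2i-1}}, \mathcal{P}^i_{x^{2i-1}} \big)^p \\ &\le (1+\varepsilon) \Big[ W_p^{d_{l_p}}\big( Q^i_{y^{2i-1}}, \mathcal{P}^i_{y^{2i-1}} \big) + W_p^{d_{l_p}}\big( \mathcal{P}^i_{y^{2i-1}}, \mathcal{P}^i_{x^{2i-1}} \big) \Big]^p \\ &= (1+\varepsilon) \Big[ W_p^{d_{l_p}}\big( Q^i_{y^{2i-1}}, P(y_i, \cdot, \cdot) \big) + W_p^{d_{l_p}}\big( P(y_i, \cdot, \cdot), P(x_i, \cdot, \cdot) \big) \Big]^p, \end{aligned} \]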

where the second inequality is obtained thanks to the triangle inequality for
the $W_p$ distance and the equality is a consequence of the Markov property.
By Assumption 2.1 and the convexity of the function $x \mapsto x^p$, we obtain,
for $a, b > 1$ such that $1/a + 1/b = 1$,

\[ \begin{aligned} A_i &\le (1+\varepsilon) \Big( \sqrt{2 C H_i(y^{2i-1})} + q\, d(y_i, x_i) \Big)^p \\ &\le (1+\varepsilon) \Big( a^{p-1} \big( 2 C H_i(y^{2i-1}) \big)^{p/2} + b^{p-1} q^p\, d^p(y_i, x_i) \Big), \end{aligned} \]

where $H_i(y^{2i-1}) = H\big( Q^i_{y^{2i-1}} \,\big|\, \mathcal{P}^i_{y^{2i-1}} \big)$. By recurrence, it leads to the finiteness
of $p$-moments. Taking the average with respect to the whole law and
summing on $i$, we obtain


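\[ \sum_{i=0}^{|\mathbb{T}_{n-1}|} \mathbb{E}(A_i) \le (1+\varepsilon) \left( a^{p-1} (2C)^{p/2} \sum_{i=1}^{|\mathbb{T}_{n-1}|} \mathbb{E}\Big[ H_i(Y^{2i-1})^{p/2} \Big] + b^{p-1} q^p \sum_{i=0}^{|\mathbb{T}_{n-2}|} \mathbb{E}(A_i) \right). \]

Letting $\varepsilon$ go to $0^+$, we are led to

\[ \sum_{i=0}^{|\mathbb{T}_{n-1}|} \mathbb{E}(A_i) \le a^{p-1} (2C)^{p/2} \sum_{i=1}^{N} \mathbb{E}\Big[ H_i(Y^{2i-1})^{p/2} \Big] + b^{p-1} q^p \sum_{i=0}^{|\mathbb{T}_{n-2}|} \mathbb{E}(A_i). \]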
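Iterating the latter inequality, increasing some terms and thanks to the Hölder
inequality, we obtain

\[ \begin{aligned} \sum_{i=0}^{|\mathbb{T}_{n-1}|} \mathbb{E}(A_i) &\le \sum_{i=1}^{N} h_i \sum_{j=0}^{N-i} \big( b^{p-1} q^p \big)^j \\ &\le \left( \sum_{i=1}^{N} h_i^{2/p} \right)^{p/2} \left( \sum_{i=1}^{N} \Big( \sum_{j=0}^{N-i} \big( b^{p-1} q^p \big)^j \Big)^{\frac{2}{2-p}} \right)^{\frac{2-p}{2}}, \end{aligned} \]

where $h_i = a^{p-1} (2C)^{p/2}\, \mathbb{E}\big[ H_i(Y^{2i-1})^{p/2} \big]$. By the definition of the Wasserstein
distance, the additivity of entropy and using the concavity of the function
$x \mapsto x^{p/2}$ for $p \in [1,2]$, we obtain

\[ \begin{aligned} W_p^{d_{l_p}}(Q, \mathcal{P})^p &\le a^{p-1} \big( 2 C H(Q | \mathcal{P}) \big)^{p/2} \left( \sum_{i=1}^{N} \Big( \sum_{j=0}^{N-i} \big( b^{p-1} q^p \big)^j \Big)^{\frac{2}{2-p}} \right)^{\frac{2-p}{2}} \\ &\le a^{p-1} \big( 2 C H(Q | \mathcal{P}) \big)^{p/2}\, N^{1 - p/2} \sum_{j=0}^{N-1} \big( b^{p-1} q^p \big)^j. \end{aligned} \]

When $q < 1$, we take $b = q^{-1}$, so that $b^{p-1} q^p = q < 1$ and the desired result
follows easily. When $q \ge 1$, we take $b = 1 + 1/N$ and the results follow from
simple analysis, which ends the proof. □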
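The last bound of the proof can be evaluated numerically. The sketch below reads the constant off that bound, $W_p^p \le a^{p-1} (2CH)^{p/2} N^{1-p/2} \sum_{j<N} (b^{p-1}q^p)^j$, with the choices of $b$ made at the end of the proof and $a = b/(b-1)$. This is our reading of the argument, not a formula quoted from the theorem, and the function name `tp_constant_from_proof` is hypothetical.

```python
def tp_constant_from_proof(C, q, p, N):
    """Constant C_N with W_p^2 <= 2 * C_N * H, read off the proof's last bound.

    Assumption: b = 1/q when q < 1 and b = 1 + 1/N otherwise, with a the
    conjugate exponent of b (1/a + 1/b = 1).
    """
    if not 1 <= p <= 2:
        raise ValueError("the argument requires 1 <= p <= 2")
    if C <= 0 or q <= 0 or N < 1:
        raise ValueError("C and q must be positive and N >= 1")
    b = 1.0 / q if q < 1 else 1.0 + 1.0 / N
    a = b / (b - 1.0)
    ratio = b ** (p - 1) * q ** p
    geometric_sum = sum(ratio ** j for j in range(N))
    # W_p^p <= (2 C H)^(p/2) * factor, hence W_p^2 <= 2 C H * factor^(2/p)
    factor = a ** (p - 1) * N ** (1 - p / 2) * geometric_sum
    return C * factor ** (2.0 / p)
```

With `q = 0.5` the value grows linearly in `N` for `p = 1` and stays bounded for `p = 2`, in line with the contractive case; for `q >= 1` the same function exhibits the much faster growth of the constant with `N`.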

Remark 2.4. For $q < 1$, the constant $C_N$ of the $T_1$ inequality
for $\mathcal{P}$ increases linearly in the dimension $N$. However, for $T_2$ this constant
is independent of the dimension, as in the i.i.d. case.

Remark 2.5. As we will see in the next section, still when $q < 1$, Theorem
2.2 and Theorem 1.5 applied to $F(X_1, \dots, X_N) = (1/N) \sum_{i=1}^{N} f(X_i)$ (where
$f$ is a Lipschitzian function defined on $E$) give us deviation inequalities
with a good order of $N$. But, when they are applied to $F(X_1, \dots, X_N) =
f(X_N)$, the deviation inequalities that we obtain do not furnish the good
order of $N$ when $N$ is large. The same remark holds when $F(X_1, \dots, X_N) =
g(X_n, X_{2n}, X_{2n+1})$ with $n \in \{1, \dots, (N - N[2])\}$ and $g$ a Lipschitzian
function defined on $E^3$. As this last question is important for the $L^1$-transportation
cost-information inequality of the invariant measure of a bifurcating Markov
chain, we give the following results.
Proposition 2.6. Under $(H_1(C))$, for any $n \in \mathbb{T}$ and $x \in E$

\[ \mathcal{L}(X_n | X_1 = x) \in T_1(c_n) \]

where

\[ c_n = C \sum_{k=0}^{r_n - 1} r_0^{2(k - a_k)}\, r_1^{2 a_k}, \]

and for all $k \in \{1, \dots, r_n - 1\}$, $a_k$ is the number of ancestors of type 1 of $X_n$
which are between the $(r_n - k + 1)$-th generation and the $r_n$-th generation.

Before the proof, we introduce some more notation. Let $n \in \mathbb{T}$. We
denote by $(z_1, \dots, z_{r_n}) \in \{0,1\}^{r_n}$ the unique path from the root 1 to $n$.
Then, for all $i \in \{1, \dots, r_n\}$, $z_i$ is the type of the ancestor of $n$ which is in
the $i$-th generation and the quantities $a_k$ defined in Proposition 2.6 are
given by

\[ a_k = \sum_{i = r_n - k + 1}^{r_n} z_i. \]
For all k £ {1, • • • , we denote by and P ^ the iterated of the tran¬ 
sition probabilities Pq and Pi defined by 



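Proof of Proposition 2.6. First note that since

\[ W_1^d(\mu, \nu) = \sup_{\|f\|_{Lip} \le 1} \left| \int f \, d\mu - \int f \, d\nu \right|, \]

condition (c) of $(H_1(C))$ implies that

\[ \|P_b f\|_{Lip} \le r_b \|f\|_{Lip} \quad \forall b \in \{0, 1\}. \]

Now let $f$ be a Lipschitzian function defined on $E$. By (b)-(c) of $(H_1(C))$
and Theorem 1.5 we have

\[ P_{z_1} \cdots P_{z_{r_n}} e^{f}(x) \le P_{z_1} \cdots P_{z_{r_n - 1}} \exp\left( P_{z_{r_n}} f + \frac{C \|f\|_{Lip}^2}{2} \right)(x). \]

Once again, applying Theorem 1.5 we obtain

\[ P_{z_1} \cdots P_{z_{r_n}} e^{f}(x) \le P_{z_1} \cdots P_{z_{r_n - 2}} \exp\left( P_{z_{r_n - 1}} P_{z_{r_n}} f + \frac{C \|f\|_{Lip}^2}{2} + \frac{C r_{z_{r_n}}^2 \|f\|_{Lip}^2}{2} \right)(x). \]

By iterating this method, we are led to

\[ P_{z_1} \cdots P_{z_{r_n}} e^{f}(x) \le \exp\left( P_{z_1} \cdots P_{z_{r_n}} f(x) + \frac{C \|f\|_{Lip}^2}{2} \Big( 1 + r_{z_{r_n}}^2 + r_{z_{r_n}}^2 r_{z_{r_n - 1}}^2 + \cdots + \prod_{i=2}^{r_n} r_{z_i}^2 \Big) \right). \]

Since

\[ 1 + r_{z_{r_n}}^2 + r_{z_{r_n}}^2 r_{z_{r_n - 1}}^2 + \cdots + \prod_{i=2}^{r_n} r_{z_i}^2 = \sum_{k=0}^{r_n - 1} r_0^{2(k - a_k)}\, r_1^{2 a_k} \quad \text{and} \quad P_{z_1} \cdots P_{z_{r_n}} f(x) = \mathbb{E}\big[ f(X_n) \,\big|\, X_1 = x \big], \]

we conclude the proof thanks to Theorem 1.5. □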

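The next result is a consequence of the previous Proposition.

Corollary 2.7. Assume $(H_1(C))$ and $r := \max\{r_0, r_1\} < 1$. Then

\[ \mathcal{L}(X_n | X_1 = x) \in T_1(c_\infty) \quad \text{and} \quad \mathcal{L}\big( (X_n, X_{2n}, X_{2n+1}) \,\big|\, X_1 = x \big) \in T_1(c'_\infty), \]

where

\[ c_\infty = \frac{C}{1 - r^2}, \qquad c'_\infty = C \left( 1 + \frac{(1+q)^2}{1 - r^2} \right). \]

Proof. That $\mathcal{L}(X_n | X_1 = x) \in T_1(c_\infty)$ is a direct consequence of Proposition
2.6. It suffices to bound $r_0$ and $r_1$ by $r$.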

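In order to deal with the ancestor-offspring case $(X_n, X_{2n}, X_{2n+1})$, we
make the following remarks.

Let $f : E^3 \to \mathbb{R}$ be a Lipschitzian function. We have

\[ \|Pf\|_{Lip} = \sup_{x, \tilde{x} \in E} \frac{\left| \int f(x,y,z) P(x, dy, dz) - \int f(\tilde{x},y,z) P(\tilde{x}, dy, dz) \right|}{d(x, \tilde{x})}. \]

Thanks to condition (c) of $(H_1(C))$, we have the following inequalities

\[ \begin{aligned} \left| \int f(x,y,z) P(x, dy, dz) - \int f(\tilde{x},y,z) P(\tilde{x}, dy, dz) \right| &\le \|f\|_{Lip} \Big( d(x, \tilde{x}) + W_1^{d_{l_1}}\big( P(x, \cdot, \cdot), P(\tilde{x}, \cdot, \cdot) \big) \Big) \\ &\le (q+1) \|f\|_{Lip}\, d(x, \tilde{x}), \end{aligned} \]

and then,

\[ \|Pf\|_{Lip} \le (q+1) \|f\|_{Lip}. \]

We recall that $X_1 = x$. We have

\[ \mathbb{E}\big[ \exp\big( f(X_n, X_{2n}, X_{2n+1}) \big) \big] = P_{z_1} \cdots P_{z_{r_n}} \big( P e^{f} \big)(x). \]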
Now, from $(H_1(C))$, the previous remarks and using the same strategy as
in the proof of Proposition 2.6, we are led to

\[ \mathbb{E}\big[ \exp\big( f(X_n, X_{2n}, X_{2n+1}) \big) \big] \le \exp\left( P_{z_1} \cdots P_{z_{r_n}} Pf(x) + \frac{C \|f\|_{Lip}^2}{2} + \frac{C (1+q)^2 \|f\|_{Lip}^2}{2} \sum_{i=0}^{r_n - 1} r^{2i} \right). \]

Since $P_{z_1} \cdots P_{z_{r_n}} Pf(x) = \mathbb{E}[f(X_n, X_{2n}, X_{2n+1})]$ and $\sum_{i=0}^{r_n - 1} r^{2i} \le 1/(1 - r^2)$,
we obtain

\[ \mathbb{E}\big[ \exp\big( f(X_n, X_{2n}, X_{2n+1}) \big) \big] \le \exp\left( \mathbb{E}\big[ f(X_n, X_{2n}, X_{2n+1}) \big] + \frac{c'_\infty \|f\|_{Lip}^2}{2} \right) \]

with $c'_\infty$ given in the Corollary. We then conclude the proof thanks to
Theorem 1.5. □


3 Concentration inequalities for bifurcating Markov 
chains 

3.1 Direct applications of Theorem 2.2

We are now interested in concentration inequalities for additive
functionals of bifurcating Markov chains. Specifically, let $N \in \mathbb{N}^*$ and $I$ be
a subset of $\{1, \dots, N\}$. Let $f$ be a real function on $E$ or $E^3$. We set

\[ M_I(f) = \sum_{i \in I} f(\Delta_i), \]

where $\Delta_i = X_i$ if $f$ is defined on $E$ and $\Delta_i = (X_i, X_{2i}, X_{2i+1})$ if $f$ is
defined on $E^3$. We also consider the empirical mean $\overline{M}_I(f)$ over $I$ defined
by $\overline{M}_I(f) = (1/|I|) M_I(f)$, where $|I|$ denotes the cardinality of $I$. In
statistical applications, the cases $N = |\mathbb{T}_n|$ and $I = \mathbb{G}_m$ (for $m \in \{0, \dots, n\}$)
or $I = \mathbb{T}_n$ are relevant (see e.g. @]).

First, we will establish concentration inequalities when $f$ is a real
Lipschitzian function defined on $E$. For a subset $I$ of $\{1, \dots, N\}$, let $F_I$ be
the function defined on $(E^N, d_{l_p})$, $p \ge 1$, by $F_I(x^N) = (1/|I|) \sum_{i \in I} f(x_i)$ for
all $x^N \in E^N$. Then $F_I$ is also a Lipschitzian function on $(E^N, d_{l_p})$ and we
have $\|F_I\|_{Lip} \le |I|^{-1/p} \|f\|_{Lip}$. The following result is a direct consequence
of Theorem 2.2.
Proposition 3.1. Let $N \in \mathbb{N}^*$ and let $\mathcal{P}$ be the law of $(X_i)_{1 \le i \le N}$. Let $f$ be
a real Lipschitzian function on $(E, d)$. Then, under $(H_p(C))$ for $1 \le p \le 2$,

\[ \mathcal{P} \circ F_I^{-1} \in T_p\big( C_N |I|^{-2/p} \|f\|_{Lip}^2 \big), \]

where $C_N$ is given in Theorem 2.2 and $\mathcal{P} \circ F_I^{-1}$ is the image law of $\mathcal{P}$
under $F_I$. In particular, for all $t > 0$ we have

\[ \mathbb{P}\big( F_I(X^N) \le -t + \mathbb{E}[F_I(X^N)] \big) \vee \mathbb{P}\big( F_I(X^N) \ge t + \mathbb{E}[F_I(X^N)] \big) \le \exp\left( - \frac{t^2 |I|^{2/p}}{2 C_N \|f\|_{Lip}^2} \right). \]

Proof. The first part is a direct consequence of Theorem 2.2 and Lemma 2.1
of [20]. The second part is an application of Theorem 1.5. □


For the next concentration inequality, we assume that $f$ is a real
Lipschitzian function defined on $(E^3, d_{l_1})$, which means that

\[ |f(x) - f(y)| \le \|f\|_{Lip} \sum_{i=1}^{3} d(x_i, y_i) \quad \forall x, y \in E^3. \]

We assume that $N$ is an odd number. Let $I$ be a subset of $\{1, \dots, (N-1)/2\}$.
Now, we denote by $F_I$ the real function defined on $(E^N, d_{l_p})$ by
$F_I(x^N) = (1/|I|) \sum_{i \in I} f(x_i, x_{2i}, x_{2i+1})$. For all $x^N, y^N \in E^N$ we have, for
some universal constant $c$,

\[ \begin{aligned} |F_I(x^N) - F_I(y^N)| &\le \frac{\|f\|_{Lip}}{|I|} \sum_{i \in I} \big( d(x_i, y_i) + d(x_{2i}, y_{2i}) + d(x_{2i+1}, y_{2i+1}) \big) \\ &\le \frac{c \|f\|_{Lip}}{|I|^{1/p}}\, d_{l_p}(x^N, y^N). \end{aligned} \]

$F_I$ is then a Lipschitzian function on $(E^N, d_{l_p})$ and $\|F_I\|_{Lip} \le c \|f\|_{Lip} / |I|^{1/p}$.
We then have the following result.


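Proposition 3.2. Let $N \in \mathbb{N}^*$ be an odd number and let $\mathcal{P}$ be the law of
$(X_i)_{1 \le i \le N}$. Let $f$ be a real Lipschitzian function on $(E^3, d_{l_1})$. Then, under
$(H_p(C))$ for $1 \le p \le 2$,

\[ \mathcal{P} \circ F_I^{-1} \in T_p\big( c\, C_N |I|^{-2/p} \|f\|_{Lip}^2 \big), \]

where $C_N$ is given in Theorem 2.2 and $\mathcal{P} \circ F_I^{-1}$ is the image law of $\mathcal{P}$
under $F_I$. In particular, for all $t > 0$ we have

\[ \mathbb{P}\big( F_I(X^N) \le -t + \mathbb{E}[F_I(X^N)] \big) \vee \mathbb{P}\big( F_I(X^N) \ge t + \mathbb{E}[F_I(X^N)] \big) \le \exp\left( - \frac{t^2 |I|^{2/p}}{2 c\, C_N \|f\|_{Lip}^2} \right). \]

Proof. The proof is a direct consequence of Theorem 2.2, Lemma 2.1 of [20]
and Theorem 1.5. □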




Remark 3.3. The previous results applied with $p = 1$ to the empirical means
$\overline{M}_{\mathbb{G}_n}(f)$ and $\overline{M}_{\mathbb{T}_n}(f)$ ($f$ being a real Lipschitzian function) give us relevant
concentration inequalities, that is, with the good order of the size of the index set,
when $q < 1$. For example, for $\overline{M}_{\mathbb{G}_n}(f)$, it suffices to take $N = |\mathbb{T}_n|$ and
$I = \mathbb{G}_n$ in Propositions 3.1 and 3.2. But for $q \ge 1$, the concentration
inequalities obtained thanks to these results are not satisfactory. In the
sequel, we will be interested in obtaining relevant concentration inequalities
for the empirical means $\overline{M}_{\mathbb{G}_n}(f)$ and $\overline{M}_{\mathbb{T}_n}(f)$ when $q \ge 1$.


3.2 Gaussian concentration inequalities for the empirical means
$\overline{M}_{\mathbb{G}_n}(f)$ and $\overline{M}_{\mathbb{T}_n}(f)$

Throughout this section, we will focus only on the case $p = 1$, and will
assume $(H_1(C))$. We set $r = r_0 + r_1$.

The main goal of this subsection is to broaden the range of application of
deviation inequalities for $\overline{M}_{\mathbb{G}_n}(f)$ and $\overline{M}_{\mathbb{T}_n}(f)$ to cases where $r > 1$, namely
when it is possible that one of the two marginal Markov chains is not a
strict contraction. The transportation inequality of Theorem 2.2 is a very
powerful tool to get deviation inequalities for all Lipschitzian functions of the
whole trajectory (up to generation $n$), and may thus concern for example
Lipschitzian functions of only the offspring generated by $P_0$ or $P_1$. Consequently,
to get "consistent" deviation inequalities, both marginal Markov chains have
to be contractions in Wasserstein distance.

However, when dealing with $\overline{M}_{\mathbb{G}_n}(f)$ or $\overline{M}_{\mathbb{T}_n}(f)$, we may hope for an
averaging effect, i.e. if one chain is not a contraction and the other one is a strong
contraction, they may in a sense compensate each other. Such averaging effects have been
observed at the level of the LLN and CLT in [16], but only asymptotically.
Our purpose here will then be to show that such averaging effects
also affect deviation inequalities.


We will use, directly inspired by Bobkov-Götze's Laplace transform
control, what we call the Gaussian Concentration property: for $\kappa > 0$, we will say
that a random variable $X$ satisfies $GC(\kappa)$ if

\[ \mathbb{E}\big[ \exp\big( t (X - \mathbb{E}[X]) \big) \big] \le \exp\big( \kappa t^2 / 2 \big) \quad \forall t \in \mathbb{R}. \]

Using Markov's inequality and optimization, this Gaussian concentration
property immediately implies that

\[ \mathbb{P}\big( X - \mathbb{E}(X) \ge r \big) \le e^{-\frac{r^2}{2\kappa}}. \]

We may thus focus here only on the Gaussian concentration property $(GC)$.
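The optimization step mentioned above is the usual Chernoff argument: Markov's inequality gives $\mathbb{P}(X - \mathbb{E}X \ge r) \le \exp(\kappa t^2/2 - tr)$ for every $t > 0$, and the exponent is minimal at $t = r/\kappa$. The following sketch checks this numerically on a grid; the function names and the grid size are illustrative choices of ours.

```python
import math


def gc_exponent(t, r, kappa):
    """Exponent kappa * t^2 / 2 - t * r obtained from Markov's inequality."""
    return kappa * t * t / 2.0 - t * r


def gc_tail_bound(r, kappa):
    """Closed-form optimum exp(-r^2 / (2 kappa)), attained at t = r / kappa."""
    return math.exp(-r * r / (2.0 * kappa))


def grid_chernoff(r, kappa, steps=2000):
    """Minimise exp(gc_exponent) over a grid of t in (0, 2 r / kappa]."""
    t_star = r / kappa
    grid = [2.0 * t_star * k / steps for k in range(1, steps + 1)]
    return min(math.exp(gc_exponent(t, r, kappa)) for t in grid)
```

For instance, `grid_chernoff(1.0, 0.5)` agrees with `gc_tail_bound(1.0, 0.5)` up to the grid resolution, and never falls below it, since the closed form is the global minimum over $t$.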


Proposition 3.4. Let $f$ be a real Lipschitzian function on $E$ and $n \in \mathbb{N}$.
Assume that $(H_1(C))$ holds. Then $\overline{M}_{\mathbb{G}_n}(f)$ satisfies $GC(\gamma_n)$ where

\[ \gamma_n = \begin{cases} \dfrac{2 C \|f\|_{Lip}^2}{|\mathbb{G}_n|} \left( \dfrac{1 - (r^2/2)^{n+1}}{1 - r^2/2} \right) & \text{if } r \ne \sqrt{2}, \\[3mm] \dfrac{2 C \|f\|_{Lip}^2 (n+1)}{|\mathbb{G}_n|} & \text{if } r = \sqrt{2}. \end{cases} \]

We recall that here $r = r_0 + r_1$.

Remark 3.5. One can observe that for $r < \sqrt{2}$, the previous inequalities are
of the same order of magnitude as the inequalities obtained thanks to
Proposition 3.1 with $q < 1$. For $r = \sqrt{2}$ the above inequalities remain relevant
since we just have a negligible loss with respect to $|\mathbb{G}_n|$. But for $r > \sqrt{2}$,
these inequalities are not significant (see the same type of limitations at the
CLT level in 0).

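Proof. Let $f$ be a real Lipschitzian function on $E$, $n \in \mathbb{N}$ and $t \in \mathbb{R}$. We
have

\[ \mathbb{E}\Big[ \exp\Big( t 2^{-n} \sum_{i \in \mathbb{G}_n} f(X_i) \Big) \Big] = \mathbb{E}\Big[ \exp\Big( t 2^{-n} \sum_{i \in \mathbb{G}_{n-1}} (P_0 + P_1) f(X_i) \Big) \times \mathbb{E}\Big[ \exp\Big( t 2^{-n} \sum_{i \in \mathbb{G}_{n-1}} \big( f(X_{2i}) + f(X_{2i+1}) - (P_0 + P_1) f(X_i) \big) \Big) \,\Big|\, \mathcal{F}_{n-1} \Big] \Big]. \]

Thanks to the Markov property, we have

\[ \mathbb{E}\Big[ \exp\Big( t 2^{-n} \sum_{i \in \mathbb{G}_{n-1}} \big( f(X_{2i}) + f(X_{2i+1}) - (P_0 + P_1) f(X_i) \big) \Big) \,\Big|\, \mathcal{F}_{n-1} \Big] = \prod_{i \in \mathbb{G}_{n-1}} P\Big( \exp\big( t 2^{-n} ( f \oplus f - (P_0 + P_1) f ) \big) \Big)(X_i), \]

where $f \oplus f$ is the function on $E^2$ defined by $f \oplus f(x,y) = f(x) + f(y)$.
We recall that from $(H_1(C))$ we have $P(x, \cdot, \cdot) \in T_1(C)$ for all $x \in E$. Now,
thanks to Theorem 1.5, we have

\[ \prod_{i \in \mathbb{G}_{n-1}} P\Big( \exp\big( t 2^{-n} ( f \oplus f - (P_0 + P_1) f ) \big) \Big)(X_i) \le \prod_{i \in \mathbb{G}_{n-1}} \exp\left( \frac{t^2 C \|f \oplus f\|_{Lip}^2}{2 \times 2^{2n}} \right). \]

Since $\|f \oplus f\|_{Lip} \le 2 \|f\|_{Lip}$, we are led to

\[ \mathbb{E}\Big[ \exp\Big( t 2^{-n} \sum_{i \in \mathbb{G}_n} f(X_i) \Big) \Big] \le \exp\left( \frac{2^2 t^2 2^{n-1} C \|f\|_{Lip}^2}{2 \times 2^{2n}} \right) \times \mathbb{E}\Big[ \exp\Big( t 2^{-n} \sum_{i \in \mathbb{G}_{n-1}} (P_0 + P_1) f(X_i) \Big) \Big]. \]

Doing the same for $\mathbb{E}[\exp(t 2^{-n} \sum_{i \in \mathbb{G}_{n-1}} (P_0 + P_1) f(X_i))]$ with $(P_0 + P_1) f$
replacing $f$ and using the inequality

\[ \| (P_0 + P_1) f \oplus (P_0 + P_1) f \|_{Lip} \le 2 r \|f\|_{Lip}, \]

we are led to

\[ \mathbb{E}\Big[ \exp\Big( t 2^{-n} \sum_{i \in \mathbb{G}_n} f(X_i) \Big) \Big] \le \mathbb{E}\Big[ \exp\Big( t 2^{-n} \sum_{i \in \mathbb{G}_{n-2}} (P_0 + P_1)^2 f(X_i) \Big) \Big] \times \exp\left( \frac{2^2 t^2 C r^2 \|f\|_{Lip}^2 2^{n-2}}{2 \times 2^{2n}} \right) \exp\left( \frac{2^2 t^2 C \|f\|_{Lip}^2 2^{n-1}}{2 \times 2^{2n}} \right). \]

Iterating this method and using the inequalities

\[ \| (P_0 + P_1)^k f \oplus (P_0 + P_1)^k f \|_{Lip} \le 2 r^k \|f\|_{Lip} \quad \forall k \in \{1, \dots, n-1\}, \]

we obtain

\[ \mathbb{E}\Big[ \exp\Big( t 2^{-n} \sum_{i \in \mathbb{G}_n} f(X_i) \Big) \Big] \le \exp\left( \frac{2^2 t^2 C \|f\|_{Lip}^2}{2 \times 2^{2n}} \sum_{k=0}^{n-1} r^{2k} 2^{n-k-1} \right) \times \mathbb{E}\Big[ \exp\big( t 2^{-n} (P_0 + P_1)^n f(X_1) \big) \Big]. \]

Since $\mathbb{E}[t 2^{-n} (P_0 + P_1)^n f(X_1)] = \mathbb{E}[t 2^{-n} \sum_{i \in \mathbb{G}_n} f(X_i)] = t 2^{-n} \nu (P_0 + P_1)^n f$,
we obtain

\[ \mathbb{E}\Big[ \exp\Big( t 2^{-n} \Big( \sum_{i \in \mathbb{G}_n} f(X_i) - \nu (P_0 + P_1)^n f \Big) \Big) \Big] \le \exp\left( \frac{2^2 t^2 C \|f\|_{Lip}^2}{2 \times 2^{2n}} \sum_{k=0}^{n-1} r^{2k} 2^{n-k-1} \right) \times \mathbb{E}\Big[ \exp\Big( t 2^{-n} \big( (P_0 + P_1)^n f(X_1) - \nu (P_0 + P_1)^n f \big) \Big) \Big]. \]

Thanks to $(H_1(C))$, we conclude that

\[ \mathbb{E}\Big[ \exp\Big( t 2^{-n} \Big( \sum_{i \in \mathbb{G}_n} f(X_i) - \nu (P_0 + P_1)^n f \Big) \Big) \Big] \le \exp\left( \frac{2^2 t^2 C \|f\|_{Lip}^2}{2 \times 2^{2n}} \sum_{k=0}^{n} r^{2k} 2^{n-k-1} \right), \]

and the results of the Proposition then follow from this last inequality. □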


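For the ancestor-offspring triangle $(X_i, X_{2i}, X_{2i+1})$, we have the following
result, which can be seen as a consequence of Proposition 3.4.

Corollary 3.6. Let $f$ be a real Lipschitzian function on $E^3$ and $n \in \mathbb{N}$.
Assume that $(H_1(C))$ holds. Then $\overline{M}_{\mathbb{G}_n}(f)$ satisfies $GC(\gamma'_n)$ where

\[ \gamma'_n = \begin{cases} \dfrac{2 C (1+q)^2 \|f\|_{Lip}^2}{|\mathbb{G}_n|} \left( \dfrac{1 - (r^2/2)^{n+2}}{1 - r^2/2} \right) & \text{if } r \ne \sqrt{2}, \\[3mm] \dfrac{2 C (1+q)^2 \|f\|_{Lip}^2 (n+2)}{|\mathbb{G}_n|} & \text{if } r = \sqrt{2}. \end{cases} \]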

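Proof. Let $f$ be a real Lipschitzian function on $E^3$, $n \in \mathbb{N}$ and $t \in \mathbb{R}$. We
have

\[ \mathbb{E}\Big[ \exp\Big( t 2^{-n} \sum_{i \in \mathbb{G}_n} f(X_i, X_{2i}, X_{2i+1}) \Big) \Big] = \mathbb{E}\Big[ \exp\Big( t 2^{-n} \sum_{i \in \mathbb{G}_n} Pf(X_i) \Big) \times \mathbb{E}\Big[ \exp\Big( t 2^{-n} \sum_{i \in \mathbb{G}_n} \big( f(X_i, X_{2i}, X_{2i+1}) - Pf(X_i) \big) \Big) \,\Big|\, \mathcal{F}_n \Big] \Big]. \]

By the Markov property and thanks to Proposition 2.2 and Theorem
1.5, we have

\[ \mathbb{E}\Big[ \exp\Big( t 2^{-n} \sum_{i \in \mathbb{G}_n} \big( f(X_i, X_{2i}, X_{2i+1}) - Pf(X_i) \big) \Big) \,\Big|\, \mathcal{F}_n \Big] \le \exp\left( \frac{t^2 C \|f\|_{Lip}^2 2^{n}}{2 \times 2^{2n}} \right). \]

Now, using $Pf$ instead of $f$ in the proof of Proposition 3.4 and using
the fact that $\|Pf\|_{Lip} \le (1 + q) \|f\|_{Lip}$ and

\[ \mathbb{E}\Big[ 2^{-n} \sum_{i \in \mathbb{G}_n} f(X_i, X_{2i}, X_{2i+1}) \Big] = \mathbb{E}\Big[ 2^{-n} \sum_{i \in \mathbb{G}_n} Pf(X_i) \Big] = 2^{-n} \nu (P_0 + P_1)^n Pf, \]

we are led to

\[ \mathbb{E}\Big[ \exp\Big( t 2^{-n} \Big( \sum_{i \in \mathbb{G}_n} f(X_i, X_{2i}, X_{2i+1}) - \nu (P_0 + P_1)^n Pf \Big) \Big) \Big] \le \exp\left( \frac{t^2}{2} \cdot \frac{2 C (1+q)^2 \|f\|_{Lip}^2}{2^n} \sum_{k=-1}^{n} \Big( \frac{r^2}{2} \Big)^{k+1} \right). \]

The results then follow by easy calculations. □

For the subtree $\mathbb{T}_n$, we have the following result.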

Proposition 3.7. Let $f$ be a real Lipschitzian function on $E$ and $n \in \mathbb{N}$.
Assume that $(H_1(C))$ holds. Then $\overline{M}_{\mathbb{T}_n}(f)$ satisfies $GC(\tau_n)$ where

2C||/||L^ ^ 1-{P/2Y +^ 


T„. = 


(r-l)^|T„| l-H/2 

+ + 

‘^^Wflllip (\’TP I n+l\ 

|T„p 2 ) 


if r / y/2, r / 1 

if r = y/2 

if r = 1. 
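Proof. Let $f$ be a real Lipschitzian function on $E$ and $n \in \mathbb{N}$. Note that

\[ \mathbb{E}\Big[ \sum_{i \in \mathbb{T}_n} f(X_i) \Big] = \nu \Big( \sum_{m=0}^{n} (P_0 + P_1)^m f \Big). \]

We have

\[ \mathbb{E}\Big[ \exp\Big( \frac{t}{|\mathbb{T}_n|} \sum_{i \in \mathbb{T}_n} f(X_i) \Big) \Big] = \mathbb{E}\Big[ \exp\Big( \frac{t}{|\mathbb{T}_n|} \sum_{i \in \mathbb{T}_{n-2}} f(X_i) \Big) \times \exp\Big( \frac{t}{|\mathbb{T}_n|} \sum_{i \in \mathbb{G}_{n-1}} \big( f + (P_0 + P_1) f \big)(X_i) \Big) \times \mathbb{E}\Big[ \exp\Big( \frac{t}{|\mathbb{T}_n|} \sum_{i \in \mathbb{G}_{n-1}} \big( f(X_{2i}) + f(X_{2i+1}) - (P_0 + P_1) f(X_i) \big) \Big) \,\Big|\, \mathcal{F}_{n-1} \Big] \Big]. \]

As in the proof of Proposition 3.4, we have

\[ \mathbb{E}\Big[ \exp\Big( \frac{t}{|\mathbb{T}_n|} \sum_{i \in \mathbb{G}_{n-1}} \big( f(X_{2i}) + f(X_{2i+1}) - (P_0 + P_1) f(X_i) \big) \Big) \,\Big|\, \mathcal{F}_{n-1} \Big] \le \exp\left( \frac{2^2 t^2 C \|f\|_{Lip}^2 2^{n-1}}{2 |\mathbb{T}_n|^2} \right). \]

This leads us to

\[ \mathbb{E}\Big[ \exp\Big( \frac{t}{|\mathbb{T}_n|} \sum_{i \in \mathbb{T}_n} f(X_i) \Big) \Big] \le \exp\left( \frac{2^2 t^2 C \|f\|_{Lip}^2 2^{n-1}}{2 |\mathbb{T}_n|^2} \right) \times \mathbb{E}\Big[ \exp\Big( \frac{t}{|\mathbb{T}_n|} \sum_{i \in \mathbb{T}_{n-2}} f(X_i) \Big) \exp\Big( \frac{t}{|\mathbb{T}_n|} \sum_{i \in \mathbb{G}_{n-1}} \big( f + (P_0 + P_1) f \big)(X_i) \Big) \Big]. \]

Iterating this method, we are led to

\[ \mathbb{E}\Big[ \exp\Big( \frac{t}{|\mathbb{T}_n|} \sum_{i \in \mathbb{T}_n} f(X_i) \Big) \Big] \le \exp\left( \frac{2^2 t^2 C \|f\|_{Lip}^2}{2 |\mathbb{T}_n|^2} \sum_{k=0}^{n-1} \Big( \sum_{l=0}^{k} r^l \Big)^2 2^{n-k-1} \right) \times \mathbb{E}\Big[ \exp\Big( \frac{t}{|\mathbb{T}_n|} \sum_{m=0}^{n} (P_0 + P_1)^m f(X_1) \Big) \Big], \]

and we then obtain, thanks to (a) of $(H_1(C))$ and Theorem 1.5,

\[ \mathbb{E}\Big[ \exp\Big( \frac{t}{|\mathbb{T}_n|} \Big( \sum_{i \in \mathbb{T}_n} f(X_i) - \nu \Big( \sum_{m=0}^{n} (P_0 + P_1)^m f \Big) \Big) \Big) \Big] \le \exp\left( \frac{2^2 t^2 C \|f\|_{Lip}^2}{2 |\mathbb{T}_n|^2} \sum_{k=0}^{n} \Big( \sum_{l=0}^{k} r^l \Big)^2 2^{n-k-1} \right). \]

In the last inequality we have used

\[ \Big\| \sum_{m=0}^{n} (P_0 + P_1)^m f \Big\|_{Lip} \le \Big( \sum_{k=0}^{n} r^k \Big) \|f\|_{Lip}. \]

The results then easily follow. □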


For the ancestor-offspring triangle we have the following result, which
can be seen as a consequence of Proposition 3.7.

Corollary 3.8. Let $f$ be a real Lipschitzian function on $E^3$ and $n \in \mathbb{N}$.
Assume that $(H_1(C))$ holds. Then $\overline{M}_{\mathbb{T}_n}(f)$ satisfies $GC(\tau'_n)$ where

23c(l+g)2||/||| 


Lip 

2^C{l+qfi\\f\\l.^ 

ra 

23C(l+g)2||/||i.^ 


1 + 


h-1)^ 


1 + 




1 + 


l-|-r^(n+l) 

(r-lfi 


ItTF 


(2|T„| - 


if $r = \sqrt{2}$


if $r = 1$.


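Proof. Let $f$ be a real Lipschitzian function on $E^3$ and $n \in \mathbb{N}$. By Hölder
inequality and using the fact that

$$
\mathbb{E}\Big[\sum_{i\in\mathbb{T}_n} f(\Delta_i)\Big] = \mathbb{E}\Big[\sum_{i\in\mathbb{T}_n} Pf(X_i)\Big],
$$

we have

$$
\begin{aligned}
\mathbb{E}\Big[\exp\Big(\frac{t}{|\mathbb{T}_n|}\Big(\sum_{i\in\mathbb{T}_n} f(\Delta_i) - \mathbb{E}\Big[\sum_{i\in\mathbb{T}_n} f(\Delta_i)\Big]\Big)\Big)\Big]
\le\ &\Big(\mathbb{E}\Big[\exp\Big(\frac{2t}{|\mathbb{T}_n|}\sum_{i\in\mathbb{T}_n}\big(f(\Delta_i) - Pf(X_i)\big)\Big)\Big]\Big)^{1/2} \\
&\times \Big(\mathbb{E}\Big[\exp\Big(\frac{2t}{|\mathbb{T}_n|}\Big(\sum_{i\in\mathbb{T}_n} Pf(X_i) - \mathbb{E}\Big[\sum_{i\in\mathbb{T}_n} Pf(X_i)\Big]\Big)\Big)\Big]\Big)^{1/2}.
\end{aligned}
$$

We bound the first term on the right-hand side of the previous inequality by
using the same calculations as in the first iteration of the proof of Corollary
3.6. We then have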


E 


exp 


2t 


J](/(Ai)-P/(W)) 

'vieTn / 


1/2 


< exp 


'2t^C\\f\\l^\Tn\ 

2|T„|2 


For the second term, we use the proof of Proposition 3.7 with $Pf$ instead
of $f$. We then have


E 


exp 


2t 


^Pf{X,)-E 


\ieTn 


< exp 


E 


JgT„ 


1/2 




I 


i i on—k—1 


2|T„ 


fc=0 \i=o / 

The results then follow by easy analysis and this ends the proof. □ 


3.3 Deviation inequalities towards the invariant measure of 
the randomly drawn chain 

All the previous results do not assume any "stability" of the Markov chain
on the binary tree, whereas for the usual asymptotic theorems the convergence
is towards the mean of the function with respect to the invariant probability
measure of the random lineage chain. To reinforce this asymptotic result
by a non-asymptotic deviation inequality, it is thus fundamental to be able to
replace, for example, $\mathbb{E}(M_{\mathbb{T}_n}(f))$ by some asymptotic quantity. This random
lineage chain is a Markov chain with transition kernel $Q = (P_0 + P_1)/2$.
We shall now suppose the existence of a probability measure $\pi$ such that
$\pi Q = \pi$. We will consider a slight modification of our main assumption and,
as we are mainly interested in concentration inequalities, let us focus on the
$p = 1$ case:

Assumption 3.9 ($(H_1'(C))$).

(a) $\nu \in T_1(C)$;

(b) $P_b(x,\cdot) \in T_1(C)$, $\forall x \in E$, $b = 0,1$;

(c) $W_1(P(x,\cdot,\cdot), P(\tilde{x},\cdot,\cdot)) \le q\, d(x,\tilde{x})$, $\forall x,\tilde{x} \in E$ and some $q > 0$. And
for $r_0, r_1 > 0$ such that $r_0 + r_1 < 2$, for $b = 0,1$, $W_1(P_b(x,\cdot), P_b(\tilde{x},\cdot)) \le r_b\, d(x,\tilde{x})$, $\forall x,\tilde{x} \in E$.


Under this assumption, using the convexity of $W_1$ (see [37]), we easily
see that

$$W_1\big(Q(x,\cdot), Q(\tilde{x},\cdot)\big) \le \frac{r_0+r_1}{2}\, d(x,\tilde{x}), \qquad \forall x,\tilde{x} \in E,$$

ensuring the strict contraction of $Q$, and then the exponential convergence
towards $\pi$ in Wasserstein distance, namely (assuming that $\pi$ has a first
moment)

$$W_1\big(Q^n(x,\cdot),\pi\big) \le \Big(\frac{r_0+r_1}{2}\Big)^n \int d(x,y)\,\pi(dy).$$
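
The two bounds above can be unpacked in a few lines. The following is only a sketch of the standard argument, written with the notation of Assumption 3.9; it uses nothing beyond $Q = (P_0+P_1)/2$, the convexity of $W_1$ and the invariance $\pi Q = \pi$.

```latex
% Step 1: convexity of W_1 applied to the mixture Q = (P_0 + P_1)/2.
W_1\big(Q(x,\cdot), Q(\tilde{x},\cdot)\big)
  \le \tfrac{1}{2}\, W_1\big(P_0(x,\cdot), P_0(\tilde{x},\cdot)\big)
    + \tfrac{1}{2}\, W_1\big(P_1(x,\cdot), P_1(\tilde{x},\cdot)\big)
  \le \frac{r_0 + r_1}{2}\, d(x,\tilde{x}).

% Step 2: the pointwise contraction extends to measures,
%   W_1(\mu Q, \mu' Q) \le \frac{r_0 + r_1}{2}\, W_1(\mu, \mu'),
% and, since \pi Q^n = \pi, iterating n times from \mu = \delta_x, \mu' = \pi gives
W_1\big(Q^n(x,\cdot), \pi\big)
  = W_1\big(\delta_x Q^n, \pi Q^n\big)
  \le \Big(\frac{r_0 + r_1}{2}\Big)^{n} W_1(\delta_x, \pi)
  = \Big(\frac{r_0 + r_1}{2}\Big)^{n} \int d(x,y)\,\pi(dy).
```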


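Let us show that we may now easily control the distance between $\mathbb{E}(M_{\mathbb{T}_n}(f))$
and $\pi(f)$. Indeed, we may first remark that

$$\mathbb{E}\Big[\sum_{k\in\mathbb{G}_n} f(X_k)\Big] = \nu(P_0+P_1)^n f$$

so that, assuming that $f$ is 1-Lipschitzian, and by the dual version of the
Wasserstein distance,

$$
\begin{aligned}
\big|\mathbb{E}(M_{\mathbb{T}_n}(f)) - \pi(f)\big|
&= \Big|\frac{1}{|\mathbb{T}_n|}\sum_{j=0}^{n} 2^j\Big(\nu\Big(\frac{P_0+P_1}{2}\Big)^j f - \pi(f)\Big)\Big| \\
&\le \frac{1}{|\mathbb{T}_n|}\sum_{j=0}^{n} 2^j\, W_1\big(\nu Q^j, \pi\big) \\
&\le \frac{1}{|\mathbb{T}_n|}\sum_{j=0}^{n} (r_0+r_1)^j\, W_1(\nu,\pi) \\
&\le c_n := \frac{c}{|\mathbb{T}_n|}\times
\begin{cases}
\dfrac{(r_0+r_1)^{n+1}-1}{r_0+r_1-1} & \text{if } r_0+r_1 \neq 1,\\[2mm]
n+1 & \text{if } r_0+r_1 = 1,
\end{cases}
\end{aligned}
$$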

for some universal $c$, which goes to 0 exponentially fast as soon as $r_0 + r_1 < 2$,
which was assumed in $(H_1'(C))$. We may then see that for $r > c_n$

$$\mathbb{P}\big(M_{\mathbb{T}_n}(f) - \pi(f) > r\big) \le \mathbb{P}\big(M_{\mathbb{T}_n}(f) - \mathbb{E}(M_{\mathbb{T}_n}(f)) > r - c_n\big)$$

and one then applies the result of the previous subsection.

4 Application to nonlinear bifurcating autoregressive models


The setting will be here the case of the nonlinear bifurcating autoregressive 
models. It has been considered as a particular realistic model to study cell 
aging 3^, and the asymptotic behavior of parametric estimators as well 
as non parametric estimators has been considered in an important series of 
work, see e.g. Q, 0,0,0, i [ii 0,0, [H, [13 0 (and for example in the 
random coefficient setting in la]). 


We will then consider the following model where, to simplify, the state
space is $E = \mathbb{R}$, where $\mathcal{L}(X_1) = \mu_0$ satisfies $T_1$, and we recursively define on
the binary tree as before

$$
\begin{cases}
X_{2k} = f_0(X_k) + \varepsilon_{2k} \\
X_{2k+1} = f_1(X_k) + \varepsilon_{2k+1}
\end{cases}
$$

with the following assumptions: 

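Assumption 4.1 (NL). $f_0$ and $f_1$ are Lipschitz continuous functions.

Assumption 4.2 (No). $(\varepsilon_k)_{k\ge 1}$ are centered i.i.d.r.v. and for all $k \ge 0$, $\varepsilon_k$
have law $\mu_\varepsilon$ and satisfy, for some positive $\delta$, $\mu_\varepsilon\big(e^{\delta x^2}\big) < \infty$. Equivalently,
$\mu_\varepsilon$ satisfies $T_1(C_\varepsilon)$.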
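
To make the recursion concrete, here is a minimal simulation sketch under (NL) and (No). The functions `f0`, `f1`, the Gaussian noise and the name `simulate_bar` are illustrative assumptions, not taken from the paper; individuals are stored with the usual heap indexing, so that $k$ has children $2k$ and $2k+1$.

```python
import math
import random

def simulate_bar(n, f0, f1, noise, x1, rng):
    """Simulate X_k for every k in T_n (heap indexing, root k = 1).

    Generation G_m is {2^m, ..., 2^(m+1) - 1} and T_n = G_0 u ... u G_n,
    so |T_n| = 2^(n+1) - 1. Index 0 of the returned list is unused padding.
    """
    size = 2 ** (n + 1) - 1
    x = [0.0] * (size + 1)
    x[1] = x1
    for k in range(1, 2 ** n):          # every k in T_{n-1} has both children in T_n
        x[2 * k] = f0(x[k]) + noise(rng)
        x[2 * k + 1] = f1(x[k]) + noise(rng)
    return x

# Hypothetical choices: both maps are Lipschitz (constants 0.5 and 0.3),
# and centered Gaussian noise has the Gaussian integrability required by (No).
def f0(u):
    return 0.5 * math.tanh(u)

def f1(u):
    return 0.3 * u + 1.0

def gaussian_noise(rng):
    return rng.gauss(0.0, 1.0)

x = simulate_bar(10, f0, f1, gaussian_noise, 0.0, random.Random(0))
print(len(x) - 1)  # number of simulated individuals: |T_10| = 2047
```

With these choices $\|f_0\|_{Lip} + \|f_1\|_{Lip} = 0.8 < 2$, so the random lineage chain is a strict contraction in the sense of the previous section.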

It is then easy to deduce that under these two assumptions, we perfectly
match the previous framework. Denoting $P_0$ and $P_1$ as previously, we
see that $(H_1')$ is verified, with the additional fact that $P = P_0 \otimes P_1$. We will
do the proof for $P_0$, the proof being the same for $P_1$. The conclusion follows for $P$ by
conditional independence of $X_{2k}$ and $X_{2k+1}$. Let us first prove that $P_0(x,\cdot)$
satisfies $T_1$. Indeed $P_0(x,\cdot)$ is the law of $f_0(x) + \varepsilon_{2k}$, and we thus have
to verify the Gaussian integrability property of Theorem 1.6. To this end,
consider $x_0 = f_0(x)$, and choose $\delta$ of condition (No) to verify the Gaussian
integrability property. We thus have that $P_0$ satisfies $T_1(C_P)$.

We now prove the Wasserstein contraction property. $P_0(x,\cdot)$ is of course the
law of $f_0(x) + \varepsilon_k$. Here $\varepsilon_k$ denotes a generic random variable, and thus
$P_0(y,\cdot)$ is the law of $f_0(y) + \varepsilon_k$. An upper bound on the Wasserstein
distance between $P_0(x,\cdot)$ and $P_0(y,\cdot)$ can then be obtained by the coupling
where we choose the same noise $\varepsilon_k$ for the realization of the two
marginal laws. Let $f$ be any Lipschitz function such that $\|f\|_{Lip} \le 1$. Then

$$
\begin{aligned}
\int_E f(z)P_0(x,dz) - \int_E f(z)P_0(y,dz)
&= \mathbb{E}\big[f(f_0(x)+\varepsilon_1) - f(f_0(y)+\varepsilon_1)\big] \\
&\le \|f\|_{Lip}\,|f_0(x) - f_0(y)|.
\end{aligned}
$$


By the Monge-Kantorovitch duality expression of the Wasserstein distance,
one then has

$$W_1\big(P_0(x,\cdot), P_0(y,\cdot)\big) \le |f_0(x) - f_0(y)| \le \|f_0\|_{Lip}\,|x-y|.$$
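
The coupling argument can be checked numerically on empirical measures. The sketch below is only an illustration: `f0` is a hypothetical Lipschitz map, the noise is Gaussian, and `empirical_w1` uses the fact that, on the real line and for two samples of equal size, the optimal coupling pairs the order statistics.

```python
import math
import random

def empirical_w1(u, v):
    """W_1 between two empirical measures on R with the same number of atoms."""
    if len(u) != len(v):
        raise ValueError("samples must have the same size")
    return sum(abs(a - b) for a, b in zip(sorted(u), sorted(v))) / len(u)

def f0(x):
    # Hypothetical autoregression function, Lipschitz with constant 0.5.
    return 0.5 * math.sin(x)

def coupled_w1(x, y, size=1000, seed=0):
    """Reuse one noise sample for both marginals, as in the coupling above."""
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in range(size)]
    law_x = [f0(x) + e for e in noise]   # sample from P_0(x, .)
    law_y = [f0(y) + e for e in noise]   # sample from P_0(y, .)
    return empirical_w1(law_x, law_y)

w1 = coupled_w1(0.2, 1.1)
# The three printed numbers satisfy w1 ~ |f0(x) - f0(y)| <= ||f0||_Lip |x - y|.
print(w1, abs(f0(0.2) - f0(1.1)), 0.5 * abs(0.2 - 1.1))
```

Because both samples are translates of the same noise, the empirical distance equals $|f_0(x) - f_0(y)|$ up to rounding, which is exactly the bound obtained from the coupling.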


Thus under (NL) and (No), our model fits in the framework of the previous
section with $q = \|f_0\|_{Lip} + \|f_1\|_{Lip}$, $r_0 = \|f_0\|_{Lip}$ and $r_1 = \|f_1\|_{Lip}$.
We will be interested here in the nonparametric estimation of the autoregression
functions $f_0$ and $f_1$, and we will use Nadaraya-Watson kernel type
estimators, as considered in [9]. Let $K$ be a kernel satisfying the following
assumption.

Assumption 4.3 (Ker). The function $K$ is non-negative, has compact support
$[-R,R]$, is Lipschitz continuous with constant $\|K\|_{Lip}$ and is such that
$\int K(z)\,dz = 1$.

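Let us also introduce as usual a bandwidth $h_n$, which will be taken, to
simplify, as $h_n := |\mathbb{T}_n|^{-\alpha}$ for some $0 < \alpha < 1$. The Nadaraya-Watson
estimators are then defined, for $x \in \mathbb{R}$, as

$$
\widehat{f}_{0,n}(x) := \frac{\dfrac{1}{|\mathbb{T}_n| h_n}\displaystyle\sum_{k\in\mathbb{T}_n} K\Big(\frac{X_k - x}{h_n}\Big) X_{2k}}{\dfrac{1}{|\mathbb{T}_n| h_n}\displaystyle\sum_{k\in\mathbb{T}_n} K\Big(\frac{X_k - x}{h_n}\Big)},
\qquad
\widehat{f}_{1,n}(x) := \frac{\dfrac{1}{|\mathbb{T}_n| h_n}\displaystyle\sum_{k\in\mathbb{T}_n} K\Big(\frac{X_k - x}{h_n}\Big) X_{2k+1}}{\dfrac{1}{|\mathbb{T}_n| h_n}\displaystyle\sum_{k\in\mathbb{T}_n} K\Big(\frac{X_k - x}{h_n}\Big)}.
$$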
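
As an illustration of these definitions, here is a minimal sketch of the two estimators. The triangular kernel (which satisfies (Ker) with $R = 1$), the default $\alpha = 0.2$ and the function names are assumptions made for the example. Note that summing over $k \in \mathbb{T}_n$ requires the children $X_{2k}, X_{2k+1}$, hence observations up to generation $n+1$.

```python
def triangular_kernel(u):
    """Non-negative, Lipschitz, supported on [-1, 1], with integral 1."""
    return max(0.0, 1.0 - abs(u))

def nadaraya_watson(x, point, n, alpha=0.2, kernel=triangular_kernel):
    """Return (f0_hat(point), f1_hat(point)), or None if the denominator vanishes.

    x is a list with x[k] = X_k for 1 <= k <= 2|T_n| + 1 (x[0] is unused),
    so that both children of every k in T_n are observed.
    """
    size_tn = 2 ** (n + 1) - 1                 # |T_n|
    if len(x) < 2 * size_tn + 2:
        raise ValueError("need X_k for every k up to 2|T_n| + 1")
    h = size_tn ** (-alpha)                    # bandwidth h_n = |T_n|^(-alpha)
    weights = [kernel((x[k] - point) / h) for k in range(1, size_tn + 1)]
    denom = sum(weights)                       # the factor 1/(|T_n| h_n) cancels in the ratio
    if denom == 0.0:
        return None
    num0 = sum(w * x[2 * k] for k, w in enumerate(weights, start=1))
    num1 = sum(w * x[2 * k + 1] for k, w in enumerate(weights, start=1))
    return num0 / denom, num1 / denom
```

Returning `None` when no $X_k$ falls in the kernel window is a design choice of this sketch; it makes explicit that the estimator is a ratio whose denominator has to be controlled separately, which is the self-normalization issue addressed in the text.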


Let us focus on $f_0$, as it will be exactly the same for $f_1$, and fix $x \in \mathbb{R}$. We will
be interested here in deviation inequalities of $\widehat{f}_{0,n}(x)$ with respect to $f_0(x)$.
One has to face two problems. First, it is an autonormalized estimator. This will
be dealt with by considering deviation inequalities for the numerator and the
denominator separately and then combining them. Secondly, $(x,y) \mapsto K(x)y$ is in fact
not Lipschitzian in a general state space, so that the result of the previous section
for deviation inequalities of Lipschitzian functions of the ancestor-offspring triangle
may not be applied directly. Let us tackle this problem. By definition


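$$
\widehat{f}_{0,n}(x) - f_0(x)
= \frac{\dfrac{1}{|\mathbb{T}_n| h_n}\displaystyle\sum_{k\in\mathbb{T}_n} K\Big(\frac{X_k - x}{h_n}\Big)\big[f_0(X_k) - f_0(x) + \varepsilon_{2k}\big]}{\dfrac{1}{|\mathbb{T}_n| h_n}\displaystyle\sum_{k\in\mathbb{T}_n} K\Big(\frac{X_k - x}{h_n}\Big)}
= \frac{N_n + M_n}{D_n}
$$

where

$$
N_n := \sum_{k\in\mathbb{T}_n} K\Big(\frac{X_k - x}{h_n}\Big)\big[f_0(X_k) - f_0(x)\big],
\qquad
M_n := \sum_{k\in\mathbb{T}_n} K\Big(\frac{X_k - x}{h_n}\Big)\varepsilon_{2k},
\qquad
D_n := \sum_{k\in\mathbb{T}_n} K\Big(\frac{X_k - x}{h_n}\Big).
$$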


Denote also $\widetilde{N}_n = N_n/(|\mathbb{T}_n| h_n)$, $\widetilde{M}_n = M_n/(|\mathbb{T}_n| h_n)$, $\widetilde{D}_n = D_n/(|\mathbb{T}_n| h_n)$.
Let us remark that $D_n$ and $N_n$ completely enter the framework of Proposition 3.7.
We may thus prove


Proposition 4.4. Let us assume that (NL), (No) and (Ker) hold, and
$q = \|f_0\|_{Lip} + \|f_1\|_{Lip} < \sqrt{2}$. Let us also suppose that $\alpha < 1/4$. Then for
all $r > 0$ such that $r > \mathbb{E}(\widetilde{N}_n)/\mathbb{E}(\widetilde{D}_n)$, there exist constants $C, C', C'' > 0$
such that

$$
\mathbb{P}\big(|\widehat{f}_{0,n}(x) - f_0(x)| > r\big)
\le 2\exp\Big(-C\big(r\,\mathbb{E}(\widetilde{D}_n) - \mathbb{E}(\widetilde{N}_n)\big)^2 |\mathbb{T}_n| h_n^2\Big)
+ 2\exp\Big(-C'\,\frac{\big(r\,\mathbb{E}(\widetilde{D}_n) - \mathbb{E}(\widetilde{N}_n)\big)^2 |\mathbb{T}_n| h_n^2}{1 + C''\,\dfrac{r^2}{h_n^2}}\Big).
$$

Proof. Remark first that, by (Ker), $K$ is Lipschitz continuous so that $y \mapsto K\big(\frac{y-x}{h_n}\big)$
is also Lipschitzian with constant $\|K\|_{Lip}/h_n$. The mapping $y \mapsto K\big(\frac{y-x}{h_n}\big)\big(f_0(y) - f_0(x)\big)$,
as $K$ has a compact support and $f_0$ is Lipschitzian,
is also Lipschitzian with constant $R\|K\|_{Lip}\|f_0\|_{Lip} + \|f_0\|_{Lip}\|K\|_\infty$. We can
then use Proposition 3.7 to get deviation inequalities for $D_n$. For all positive
$r$ there exists a constant $L$ (explicitly given through Proposition 3.7) such
that

$$\mathbb{P}\big(|D_n - \mathbb{E}(D_n)| > r|\mathbb{T}_n| h_n\big) \le 2\exp\Big(-\frac{L r^2 |\mathbb{T}_n| h_n^4}{\|K\|_{Lip}^2}\Big).$$

For $N_n + M_n$ we cannot directly apply Proposition 3.7 due to the successive
dependence of $X_k$ at generation $n$ and $\varepsilon_{2k}$ of generation $n-1$. But as we
are interested in deviation inequalities, we may split the deviation coming
from each term. For $N_n$, it is once again a simple application of Proposition 3.7:

$$
\mathbb{P}\big(|N_n - \mathbb{E}(N_n)| > r|\mathbb{T}_n| h_n\big)
\le 2\exp\Big(-\frac{L r^2 |\mathbb{T}_n| h_n^2}{\big(R\|K\|_{Lip}\|f_0\|_{Lip} + \|f_0\|_{Lip}\|K\|_\infty\big)^2}\Big).
$$


Note that $\varepsilon_{2k}$ is independent of $X_k$, and centered so that $\mathbb{E}(M_n) = 0$,
and satisfies a transportation inequality. Note also that $K$ is bounded. By a
simple conditioning argument, we may control the Laplace transform of $M_n$
quite simply. We then have for all positive $r$

$$
\mathbb{P}\big(|M_n| > r|\mathbb{T}_n| h_n\big) \le 2\exp\Big(-\frac{r^2 |\mathbb{T}_n| h_n^2}{2 C_\varepsilon \|K\|_\infty^2}\Big).
$$
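
For completeness, the conditioning argument can be sketched as follows. This is only one way to carry it out, under two assumptions that are not spelled out in the text: $T_1(C_\varepsilon)$ is used in the form $\mathbb{E}\,e^{t\varepsilon} \le e^{C_\varepsilon t^2/2}$ for the centered noise, and $\mathcal{F}_m$ denotes the $\sigma$-field generated by $(X_j)_{j\in\mathbb{T}_m}$, of which the noises $\varepsilon_{2k}$, $k \in \mathbb{G}_m$, are independent.

```latex
% Write K_k := K((X_k - x)/h_n). The sum over T_{n-1} is F_n-measurable, since
% \varepsilon_{2k} = X_{2k} - f_0(X_k) with 2k \in T_n. Conditioning on F_n:
\mathbb{E}\big[e^{t M_n}\big]
  = \mathbb{E}\Big[ e^{t \sum_{k \in \mathbb{T}_{n-1}} K_k \varepsilon_{2k}}\,
      \mathbb{E}\Big[ e^{t \sum_{k \in \mathbb{G}_n} K_k \varepsilon_{2k}} \,\Big|\, \mathcal{F}_n \Big] \Big]
  \le e^{C_\varepsilon t^2 \|K\|_\infty^2 2^{n} / 2}\;
      \mathbb{E}\Big[ e^{t \sum_{k \in \mathbb{T}_{n-1}} K_k \varepsilon_{2k}} \Big].

% Iterating over the generations n, n-1, ..., 0 and using \sum_{m=0}^{n} 2^m = |T_n|:
\mathbb{E}\big[e^{t M_n}\big] \le \exp\Big( \frac{C_\varepsilon t^2 \|K\|_\infty^2 |\mathbb{T}_n|}{2} \Big).

% Chernoff bound, optimized at t = r h_n / (C_\varepsilon \|K\|_\infty^2):
\mathbb{P}\big( M_n > r |\mathbb{T}_n| h_n \big)
  \le \exp\Big( - t r |\mathbb{T}_n| h_n + \frac{C_\varepsilon t^2 \|K\|_\infty^2 |\mathbb{T}_n|}{2} \Big)
  = \exp\Big( - \frac{r^2 |\mathbb{T}_n| h_n^2}{2 C_\varepsilon \|K\|_\infty^2} \Big).
```

The same bound applied to $-M_n$ gives the factor 2 for the two-sided deviation.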


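However, we cannot use these estimates directly, as the estimator is
autonormalized. Instead

$$
\begin{aligned}
\mathbb{P}\big(\widehat{f}_{0,n}(x) - f_0(x) > r\big)
&\le \mathbb{P}(N_n + M_n > r D_n) \\
&\le \mathbb{P}\big(N_n - \mathbb{E}(N_n) - r(D_n - \mathbb{E}(D_n)) + M_n > r\,\mathbb{E}(D_n) - \mathbb{E}(N_n)\big) \\
&\le \mathbb{P}\big(N_n - \mathbb{E}(N_n) - r(D_n - \mathbb{E}(D_n)) > (r\,\mathbb{E}(D_n) - \mathbb{E}(N_n))/2\big) \\
&\quad + \mathbb{P}\big(M_n > (r\,\mathbb{E}(D_n) - \mathbb{E}(N_n))/2\big).
\end{aligned}
$$

Remark now, to conclude, that $y \mapsto K((y-x)/h_n)(f_0(y) - f_0(x)) - rK((y-x)/h_n)$ is
$\big(R\|K\|_{Lip}\|f_0\|_{Lip} + \|f_0\|_{Lip}\|K\|_\infty + r\|K\|_{Lip}/h_n\big)$-Lipschitzian, and we may
then proceed as before. □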

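Remark 4.5. In order to get fully practical deviation inequalities, let us
remark that

$$
\mathbb{E}\big[\widetilde{D}_n\big] = \frac{1}{|\mathbb{T}_n| h_n}\sum_{m=0}^{n} 2^m \mu_0 Q^m H \xrightarrow[n\to+\infty]{} \nu(x)
$$

where $H(y) = K((y-x)/h_n)$, $\nu(\cdot)$ is the invariant density of the Markov
chain associated to a random lineage, and

$$
\mathbb{E}\big[\widetilde{N}_n\big] = \frac{1}{|\mathbb{T}_n| h_n}\sum_{m=0}^{n} 2^m \big(\mu_0 Q^m(H f_0) - f_0(x)\,\mu_0 Q^m H\big) \xrightarrow[n\to+\infty]{} 0.
$$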


We refer to for quantitative versions of these limits. 

Remark 4.6. Of course this non parametric estimation is in some sense 
incomplete, as we would have liked to consider a deviation inequality for 
$\sup_x |\widehat{f}_{0,n}(x) - f_0(x)|$. The problem is somewhat much more complicated
here, as the estimator is self normalized. However, it is a crucial problem 
that we will consider in the near future. For some ideas which could be 
useful here, let us cite the results of for (uniform) deviation inequalities
for estimators of density in the i.i.d. case, and to for control of
the Wasserstein distance of the empirical measure of i.i.d.r.v. or of Markov 
chains. 


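Remark 4.7 (Estimation of the T-transition probability). We assume that
the process has, as initial law, the invariant probability $\nu$. We denote by $f$ the
density of $(X_1, X_2, X_3)$. For the estimation of $f$, we propose the estimator
$\widehat{f}_n$ defined by

$$
\widehat{f}_n(x,y,z) = \frac{1}{|\mathbb{T}_n| h_n^3}\sum_{k\in\mathbb{T}_n} K\Big(\frac{x - X_k}{h_n}\Big) K\Big(\frac{y - X_{2k}}{h_n}\Big) K\Big(\frac{z - X_{2k+1}}{h_n}\Big).
$$

An estimator of the T-probability transition is then given by

$$
\widehat{P}_n(x,y,z) = \frac{\widehat{f}_n(x,y,z)}{\widetilde{D}_n}.
$$

For $x, y, z \in \mathbb{R}$, one can observe that the function $G$ defined on $\mathbb{R}^3$ by

$$
G(u,v,w) = K\Big(\frac{x - u}{h_n}\Big) K\Big(\frac{y - v}{h_n}\Big) K\Big(\frac{z - w}{h_n}\Big)
$$

is Lipschitzian with $\|G\|_{Lip} \le \big(\|K\|_\infty^2 \|K\|_{Lip}\big)/h_n$. We have

$$
\widehat{P}_n(x,y,z) - P(x,y,z) = \frac{\widehat{f}_n(x,y,z) - f(x,y,z)}{\widetilde{D}_n} + \frac{f(x,y,z)\big(\nu(x) - \widetilde{D}_n\big)}{\nu(x)\,\widetilde{D}_n}.
$$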


Now using the decomposition

$$
\widehat{f}_n(x,y,z) - f(x,y,z) = \big(\widehat{f}_n(x,y,z) - \mathbb{E}\big[\widehat{f}_n(x,y,z)\big]\big) + \big(\mathbb{E}\big[\widehat{f}_n(x,y,z)\big] - f(x,y,z)\big)
$$

and the convergence of $\mathbb{E}\big[\widehat{f}_n(x,y,z)\big]$ to $f(x,y,z)$, we obtain a deviation
inequality for $|\widehat{P}_n(x,y,z) - P(x,y,z)|$ similar to that obtained in
Proposition 4.4.

When the density $g_\varepsilon$ of $(\varepsilon_2, \varepsilon_3)$ is known, another strategy for the
estimation of the T-transition probability is to observe that
$P(x,y,z) = g_\varepsilon(y - f_0(x), z - f_1(x))$. An estimator of $P(x,y,z)$ is then given by
$\widehat{P}_n(x,y,z) = g_\varepsilon(y - \widehat{f}_{0,n}(x), z - \widehat{f}_{1,n}(x))$, where $\widehat{f}_{0,n}$ and $\widehat{f}_{1,n}$ are the estimators defined above.
If $g_\varepsilon$ is Lipschitzian, we have

$$
\big|\widehat{P}_n(x,y,z) - P(x,y,z)\big| \le \|g_\varepsilon\|_{Lip}\Big(\big|\widehat{f}_{0,n}(x) - f_0(x)\big| + \big|\widehat{f}_{1,n}(x) - f_1(x)\big|\Big)
$$

and the deviation inequalities for $|\widehat{P}_n(x,y,z) - P(x,y,z)|$ are thus of the
same order as those given by Proposition 4.4.